Research Interests – Alin Dobra
Abstract
The recent explosion of the Internet and rapid technological advances in gathering and storing information have resulted in huge amounts of data being collected at a very rapid rate. Developing ways to extract relevant information from such large amounts of data in a human-comprehensible form, and at the same time in a timely and cost-effective way, is of great practical importance. Approximate query processing and data mining, both concerned with extracting useful knowledge from large amounts of data but starting from different premises, have been the subject of my research as a PhD candidate. My work so far is for the most part theoretical in nature, but the problems I have attacked have direct practical applicability. For me, theory is only a tool, albeit a very effective one for gaining insight into a problem, and not an end in itself; I have always accompanied it with implementation and empirical validation. In what follows I describe in some detail my particular interests in these two areas, pointing out past, current, and future work.
Similar resources
The DBO Database System (598)
We demonstrate our prototype of the DBO database system. DBO is designed to facilitate scalable analytic processing over large data archives. DBO’s analytic processing performance is competitive with other database systems; however, unlike any other existing research or industrial system, DBO maintains a statistically meaningful guess to the final answer to a query from start to finish during q...
Probabilistic Characterization of Decision Trees
In this paper we use the methodology introduced in Dhurandhar and Dobra (2006) for analyzing the error of classifiers and model selection measures to analyze decision tree algorithms. The methodology consists of obtaining parametric expressions for the moments of the generalization error (GE) for the classification model of interest, followed by plotting these expressions for interpretabil...
Insights into Cross-validation
Cross-validation is one of the most widely used techniques for estimating the generalization error of classification algorithms. Though several empirical studies have been conducted in the past to study the behavior of this method, none of them clearly elucidates the reasons behind the observed behavior. In this paper we study the behavior of the moments (i.e. expected value and variance) of th...
Distribution free bounds for relational classification
Statistical Relational Learning (SRL) is a sub-area of Machine Learning which addresses the problem of performing statistical inference on data that is correlated and not independently and identically distributed (i.i.d.), as is generally assumed. For the traditional i.i.d. setting, distribution free bounds exist, such as the Hoeffding bound, which are used to provide confidence bounds on the ge...